Biomarker characterization of clinical subtypes of Parkinson Disease

The biological underpinnings of the PD clusters remain unknown because existing PD clusters lack biomarker characterization. We aimed to identify clinical subtypes of Parkinson Disease (PD) in an Asian cohort and to characterize them by comparing clinical assessments, genetic status and blood biochemical markers. A total of 206 PD patients were included from a multi-centre Asian cohort. Hierarchical clustering was performed to generate PD subtypes. The subtypes were characterized clinically and biologically by comparing clinical assessments, allelic distributions of Asian-related PD genes (SNCA, LRRK2, Park16, ITPKB, SV2C) and blood biochemical markers. Hierarchical clustering identified three clusters: cluster A (severe subtype in motor, non-motor and cognitive domains), cluster B (intermediate subtype with cognitive impairment and mild non-motor symptoms) and cluster C (mild subtype with young age of onset). The three clusters had significantly different allele frequencies in two SNPs (Park16 rs6679073 A allele carriers in clusters A, B and C: 67%, 74%, 89%, p = 0.015; SV2C rs246814 T allele carriers: 7%, 12%, 25%, p = 0.026). Serum homocysteine (Hcy) and C-reactive protein (CRP) levels also differed significantly among the three clusters (mean Hcy and CRP levels in clusters A, B and C: 19.4 ± 4.2, 18.4 ± 5.7, 15.6 ± 5.6, adjusted p = 0.005; and 2.5 ± 5.0, 1.5 ± 2.4, 0.9 ± 2.1, adjusted p < 0.0001, respectively). Of the three subtypes identified among early PD patients, the severe subtype was associated with a significantly lower frequency of the Park16 and SV2C effect alleles and with higher levels of Hcy and CRP. These biomarkers may be useful for stratifying PD subtypes and identifying more severe subtypes.


INTRODUCTION
Parkinson's disease (PD) is the most common hypokinetic movement disorder and shows significant heterogeneity in symptoms and outcomes. Non-motor symptoms (NMS), which result from dysfunction of various neurotransmitter pathways and affect both the central and peripheral nervous systems, contribute to this heterogeneity 1,2 . Subtype identification has been established as one of the top three clinical research priorities in the field of PD 3 . Identifying PD subtypes could help reveal the underlying etiology and clarify the disease course. More importantly, PD subtyping could guide the design of clinical trials and future personalized PD treatment.
Cluster analysis, a data-driven approach, can help define disease phenotypes. Most studies have used cluster analysis to stratify PD subtypes based on clinical data, such as motor, NMS and demographic features 4,5 . However, these studies have been limited by the inclusion of PD patients at different disease stages, the absence of genetic data that may influence clinical heterogeneity, and limited analysis of Asian cohorts.
The biological underpinnings of the PD clusters remain unknown, as the existing multidimensional, data-driven derivations of PD clusters lack biomarker characterisation. PD biomarkers, including clinical, blood, cerebrospinal fluid (CSF) and imaging biomarkers, play increasingly important roles in early diagnosis and disease prognostication 6 . Blood biomarkers may have wider applicability than CSF and imaging biomarkers because they are more accessible and less costly 7 .
Homocysteine (Hcy) and C-reactive protein (CRP) are two blood biochemical biomarkers associated with PD severity. Severe PD subtypes have been found to have significantly higher CRP levels 8 , while elevated plasma Hcy levels have been found in depressed and cognitively impaired PD patients 9 . Vitamin D, uric acid (UA) and lipids are thought to play important neuroprotective roles in PD. Higher serum UA levels have been associated with a lower risk of developing PD 10 and a more benign prognosis in PD patients 11 . Vitamin D deficiency is common in PD patients 12 , and lower vitamin D levels have been associated with a worse prognosis in PD 13 . In addition, serum lipid biomarkers have been reported to be lower in PD patients than in healthy controls [14][15][16][17] .
To understand the clinical heterogeneity of PD, we used cluster analysis to search for subtypes in a multi-centre Asian early PD cohort. The aims of our study were: (1) to identify distinct PD clusters from a comprehensive dataset; and (2) to describe the clinical and biological features of the identified subtypes by comparing clinical characteristics, allelic distributions of Asian-related PD genes, and blood biochemical markers.
[…] our cohort. A summary of patient demographics and clinical characteristics is shown in Table 1. A comparison of comorbidities among the three clusters is provided in Supplementary Table 1.

Cluster analysis results
Three distinct PD clusters were identified by hierarchical cluster analysis (Fig. 1). A comparison of features among the three clusters is shown in Table 2. The three clusters differed significantly in age of diagnosis (mean age of diagnosis in clusters A, B and C: 69.6 ± 7.9, 63.6 ± 7.4 and 59.4 ± 9.7 years, respectively; p < 0.001). They also differed significantly in all cognitive domain scores, most motor scores (MDS-UPDRS part II and III scores, tremor score, PIGD score) and most NMS items (MDS-UPDRS part I score, systolic BP drop, ESS total score and HADS depression score). There were no significant differences in PRS, HADS anxiety score or RBD1Q among the three clusters.
Cluster A, comprising 43 (20.9%) PD patients, was the severe subtype, with the greatest impairment in the motor, NMS and cognitive domains, as shown by the highest MDS-UPDRS part I, II and III scores, PIGD score and depression score and the lowest cognitive domain scores. Significant BP drop was most common in cluster A (35% of patients, vs 21% and 11% in clusters B and C, p = 0.010).
The second cluster, cluster B, was the largest, with 98 patients (47.6%). Cluster B was the intermediate subtype, characterized by cognitive impairment with mild NMS. Its cognitive domain scores were intermediate between those of clusters A and C. However, patients in this cluster had very mild NMS, as indicated by the lowest MDS-UPDRS part I score (2.5 in cluster B vs 6 and 4 in clusters A and C, p < 0.001).
Cluster C comprised 65 patients (31.6%) and was a mild subtype with a younger age of onset. The mean age of diagnosis in cluster C was significantly younger than in the other two clusters (59.4 ± 9.7 years in cluster C vs 63.6 ± 7.4 and 69.6 ± 7.9 years in clusters B and A, p < 0.001). Cluster C performed well cognitively, with the highest cognitive domain scores. Significant BP drop was uncommon in this cluster, with only 7 patients having a significant systolic blood pressure drop. The remaining non-motor and motor profiles in cluster C were relatively mild.

Characterization of PD subtypes using clinical biomarkers
Clinical variables not included in the clustering model were used for post-hoc comparisons among the clusters (Table 3). After correction for multiple comparisons, the three clusters remained significantly different in MCI rate, MoCA score and most NMSS domain scores, except domain 4 (perceptual problems) and domain 8 (sexual function).
Cluster A consistently performed worst across all profiles, with the highest MCI percentage (81%) and the highest NMSS total score. Cluster B showed marked cognitive impairment with mild NMS, with a moderate MCI percentage (64%) and the lowest NMSS total score (11 vs 26 and 16 in clusters A and C, p < 0.001, q < 0.0019). Cluster C was the mild subtype, with the lowest MCI percentage (15%, 64% and 81% for clusters C, B and A, respectively; p < 0.001, q < 0.0019).

Characterization of PD subtypes using blood biomarkers
Allelic distributions of Asian-related PD genes in the three PD clusters. All 206 PD patients were genotyped. The Park16 rs6679073 A allele carrier frequency was 76.7% (158 A allele carriers: 77 patients with AA and 81 with AC), and the SV2C rs246814 T allele carrier frequency was 15.0% (31 T allele carriers: 2 patients with TT and 29 with TC).
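As a worked check on the genotype counts above, the sketch below distinguishes carrier frequency (patients with at least one copy of the effect allele) from allele frequency (effect alleles counted over 2 × 206 chromosomes); the reported 76.7% and 15.0% correspond to carrier proportions. It assumes that every patient not listed as a carrier is homozygous for the other allele (C in both cases) and that no genotypes are missing. The function name is illustrative only.

```python
def carrier_and_allele_frequency(hom_effect, het, hom_other):
    """Return (carrier frequency, allele frequency) for one biallelic SNP.

    hom_effect: patients with two copies of the effect allele
    het:        patients with one copy
    hom_other:  patients with no copy
    """
    n = hom_effect + het + hom_other
    carriers = hom_effect + het            # at least one effect allele
    effect_alleles = 2 * hom_effect + het  # each patient contributes two alleles
    return carriers / n, effect_alleles / (2 * n)


# Park16 rs6679073: 77 AA, 81 AC; the remaining 206 - 158 = 48 are assumed CC.
park16 = carrier_and_allele_frequency(77, 81, 206 - 158)
# SV2C rs246814: 2 TT, 29 TC; the remaining 206 - 31 = 175 are assumed CC.
sv2c = carrier_and_allele_frequency(2, 29, 206 - 31)

print(f"Park16 A: carriers {park16[0]:.1%}, allele {park16[1]:.1%}")
print(f"SV2C T:   carriers {sv2c[0]:.1%}, allele {sv2c[1]:.1%}")
```

Run as written, it prints carrier frequencies of 76.7% and 15.0% against allele frequencies of 57.0% and 8.0%, which is why the two quantities should not be used interchangeably.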
The three clusters had significantly different effect allele carrier frequencies for these two SNPs (Park16 rs6679073 A allele carriers in clusters A, B and C: 67%, 74%, 89%, p = 0.015, q = 0.065; SV2C rs246814 T allele carriers: 7%, 12%, 25%, p = 0.026, q = 0.065; Table 4). Cluster A (severe subtype in motor, NMS and cognitive domains) had the lowest percentage of carriers of both the Park16 and SV2C effect alleles, while cluster C (mild subtype with young age of onset) had the highest percentage of carriers of both SNPs.
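The comparison above contrasts carrier proportions across three clusters. A minimal sketch of a Pearson chi-square test for such a 3 × 2 (cluster × carrier status) table follows. With three groups the test has two degrees of freedom, and for that case the chi-square survival function reduces exactly to exp(−x/2), so no statistics library is needed. The counts in the example are hypothetical, not the Table 4 data, and the sketch does not implement Fisher's exact test, which the Methods state was used where appropriate (typically when expected counts are small).

```python
import math


def chi2_three_groups(carriers, totals):
    """Pearson chi-square test of equal carrier proportions across three groups.

    carriers[i] is the number of effect-allele carriers in group i and
    totals[i] the number of patients in group i. Returns (statistic, df, p).
    """
    if len(carriers) != 3 or len(totals) != 3:
        raise ValueError("this sketch handles exactly three groups (df = 2)")
    n = sum(totals)
    n_carriers = sum(carriers)
    column_totals = (n_carriers, n - n_carriers)
    stat = 0.0
    for c, t in zip(carriers, totals):
        # Observed carriers and non-carriers in this group vs expected under H0.
        for observed, col_total in zip((c, t - c), column_totals):
            expected = t * col_total / n
            stat += (observed - expected) ** 2 / expected
    df = (3 - 1) * (2 - 1)
    # For df = 2 the chi-square survival function is exactly exp(-x / 2).
    p = math.exp(-stat / 2)
    return stat, df, p


# Hypothetical counts for three clusters (not the study's data).
stat, df, p = chi2_three_groups(carriers=[2, 10, 18], totals=[40, 100, 60])
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.4f}")
```

For these illustrative counts the statistic is about 15.69 with p ≈ 0.0004; identical proportions in all three groups give a statistic of 0 and p = 1.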
Comparison of blood biochemical markers among the three clusters. Hcy and CRP levels differed significantly among the three clusters in the generalized linear model adjusted for age of diagnosis and sex. Hcy and CRP levels were highest in cluster A (severe subtype in motor, NMS and cognitive domains) and lowest in cluster C (mild subtype with young age of onset). These differences remained significant after adjustment for multiple comparisons. Mean Hcy levels in clusters A, B and C were 19.4 ± 4.2, 18.4 ± 5.7 and 15.6 ± 5.6 (p = 0.001, q = 0.005), and mean CRP levels were 2.5 ± 5.0, 1.5 ± 2.4 and 0.9 ± 2.1 (p < 0.001, q < 0.0001) (Table 5). The differences in Hcy and CRP levels among the three clusters also remained significant after adjustment for age of diagnosis, sex and significant comorbidities, including hypertension, hyperlipidemia, lipid medication and hypertension medication (Supplementary Table 2).

DISCUSSION
In this study, 206 early PD patients recruited within 1 year of diagnosis were assigned to three clusters by an unsupervised, data-driven hierarchical cluster analysis: cluster A (severe subtype in motor, NMS and cognitive domains), cluster B (intermediate subtype with cognitive impairment and mild NMS) and cluster C (mild subtype with young age of onset). Despite similar disease durations, the three clusters presented with substantially different clinical features and blood biomarker profiles (genetic and biochemical markers). The significantly different allele frequencies in two SNPs (Park16 rs6679073 A allele and SV2C rs246814 T allele) suggest that these may be important genetic biomarkers for PD subtypes. We also found Hcy and CRP to be promising biomarkers for identifying the severe PD subtype. These findings contribute to our understanding of PD heterogeneity, especially among Asian PD patients.
Various PD subtypes have been identified through cluster analysis in previous studies. The diffuse malignant cluster reported by Fereshtehnejad in two different studies 4,18 most closely resembles cluster A in our study. Cluster A was severe in all disease domains, including motor, NMS and cognition. This severe cluster is most likely explained by simultaneous involvement of dopaminergic and non-dopaminergic pathways at an early disease stage 18 .
Previous studies reported that the most critical determinants of PD subtype were UPDRS scores, cognitive status, RBD and orthostatic hypotension 4,18 . In our study, cluster A was best defined by MDS-UPDRS part I, II and III scores, cognitive impairment and significant BP drop, suggesting that the most critical drivers of PD subtype in our cohort are consistent with previous reports. However, RBD was not an effective clinical determinant for PD subtyping in our cohort. The RBD detection rate in our study was low, likely because RBD was screened with the RBD1Q rather than diagnosed by gold-standard overnight polysomnography. In addition, the MCI percentage in our cohort was higher than that reported by Weintraub in the PPMI cohort 19 . The older age of diagnosis, fewer years of education and different ethnic composition of our cohort may contribute to this difference. Identifying the severe cluster and its critical clinical drivers could enable clinicians to recognize PD patients with a more severe subtype at an early disease stage.
Besides the severe cluster, there were two comparatively more benign PD clusters in our cohort. Cluster C was characterised by young onset with generally better performance in all disease domains. This finding is consistent with previous studies 5,20 that have identified a mild PD subtype with young onset.
Another comparatively benign PD cluster in our cohort was cluster B, comprising 47.6% of the PD patients, whose key features were cognitive impairment and mild NMS. Cluster B is a distinctive subtype in our cohort because its cognitive and NMS scores point in opposite directions: cognition was impaired while NMS were mild. The mechanism of cognitive impairment in PD is not fully understood. Acetylcholine neurotransmitter dysfunction is one possible pathway 21 . Muller et al reported that isolated cognitive impairment in PD patients was related to isolated cortical cholinergic deficits, while a combination of cognitive decline, falls and RBD correlated with thalamic and cortical cholinergic deficiency 22 . The features of cluster B in our cohort suggest that the brain areas underlying cognitive impairment in PD patients might be heterogeneous.
When characterizing genetic markers in the three PD subtypes, we found significantly different allele frequencies in two SNPs, even though the composite genetic score (PRS) did not differ significantly among the three clusters. The mild cluster had significantly higher frequencies of the Park16 rs6679073 A allele and the SV2C rs246814 T allele, suggesting that these two SNPs may have neuroprotective effects in our Asian cohort.
Park16 SNPs have been consistently reported to play a protective role in PD development in different populations 23,24 . However, little is known about the clinical characteristics of Park16 carriers. Our study found that the proportion of Park16 carriers was highest in the mild cluster (Park16 rs6679073 A allele frequency in the mild cluster was similar to […] reported that synaptic vesicle glycoprotein 2C (SV2C) was a novel gene with a robust association with PD development in various populations. SV2C was also reported to be a functional PD candidate gene and an important mediator of dopamine homeostasis. Genetic deletion of SV2C reduced dopamine release, resulting in decreased motor activity 26 . Our findings support a possible neuroprotective effect of SV2C, as the severe cluster had the lowest percentage of SV2C T allele carriers, while the mild cluster had the highest.
Our characterization of blood biochemical markers across PD clusters identified Hcy as a promising biomarker for the severe PD subtype after adjustment for confounders and multiple comparisons. Previous studies have associated elevated blood Hcy levels with cognitive impairment in PD patients 9,27 . However, to the best of our knowledge, Hcy has not previously been reported to be associated with PD severity. The strong association between elevated Hcy levels and the severe PD subtype may open new avenues for PD treatment. Since homocysteine-lowering B vitamins have been found to slow the accelerated rate of brain atrophy in elderly people with MCI 28 , it is worth investigating whether lowering Hcy levels with vitamin supplementation can ameliorate PD severity. In addition to Hcy, we found CRP to be another promising biomarker for the severe PD subtype, in agreement with a previous report 8 . These findings support the existence of a heightened inflammatory state in severe PD subtypes.
Previous studies have reported that lipids have a neuroprotective effect on PD development. However, whether lipid markers are associated with specific PD subtypes remains controversial. We found that the clusters with an MCI prevalence above 60% (clusters A and B) had significantly higher TG levels, consistent with our previous finding that higher TG levels were related to cognitive impairment 29 . Our results also showed that the severe cluster (cluster A) had the lowest TC levels. However, these associations were not significant after adjustment for multiple comparisons. Lawton et al recently reported that the severe motor disease phenotype, poor psychological well-being and poor sleep subtype was associated with reduced Apo A1 levels 8 . Our study, however, failed to reproduce this association. We also found no significant association between our PD subtypes and vitamin D or UA levels.
Table 5 notes: Values are mean ± standard deviation. *A generalized linear model was applied to compare the biomarkers across clusters, adjusted for age of diagnosis and sex. **The false discovery rate (FDR) method was used to calculate q values controlling for multiple testing; the q value threshold was set at 0.1. Hcy homocysteine, CRP C-reactive protein, Vit D3 vitamin D3, UA uric acid, TC total cholesterol, TG triglyceride, HDL-C high-density lipoprotein cholesterol, Apo A1 apolipoprotein A1, LDL-C low-density lipoprotein cholesterol, Apo B apolipoprotein B.
X. Deng et al.
Our PD clusters were generated from an Asian cohort in which all PD patients were recruited within 1 year of diagnosis, ensuring that cluster features were not driven by differences in disease duration or stage. In addition, we included genetic status in the cluster analysis, which enabled us to investigate PD heterogeneity at the genetic level. Our study also assessed the association between a broad panel of blood biomarkers (genetic and serum biochemical markers) and PD clusters, providing a comprehensive biological characterization of the newly generated clusters. However, some limitations of the study should be noted. The current study was cross-sectional, and blood biomarkers were not monitored over time. Longitudinal follow-up of these PD subtypes to monitor their biomarkers and disease progression will be needed. Furthermore, this was a single-cohort study with a limited sample size, and its findings require validation in other populations.
In summary, we identified three subtypes of early PD in a multi-centre Asian cohort: 'severe', 'intermediate' and 'mild young-onset'. The severe subtype was associated with a significantly lower frequency of the Park16 and SV2C effect alleles and significantly higher serum Hcy and CRP levels. Park16, SV2C, Hcy and CRP may be useful biomarkers for stratifying PD patients into disease subtypes. Our findings also shed light on possible mechanisms underlying PD heterogeneity. Better stratification of PD patients into disease subtypes may enable more targeted, personalised treatment strategies. Further validation of the genetic and biochemical differences between subtypes in larger cohorts, and evaluation of their impact on PD progression, is warranted.

Participants and enrolment
Study population. A total of 206 patients with idiopathic early PD, diagnosed according to the National Institute of Neurological Disorders and Stroke (NINDS) diagnostic criteria, were recruited from the Early Parkinson's disease Longitudinal Singapore (PALS) cohort according to the inclusion and exclusion criteria of the PALS study protocol 30 . PALS is an ongoing prospective cohort study of the disease course in early PD patients recruited within 1 year of diagnosis.
Enrolment. Our study was conducted at two movement disorder outpatient clinics in Singapore (Singapore General Hospital and Tan Tock Seng Hospital). The study was approved by the SingHealth Centralized Institutional Review Board (CIRB; Ref 2019/2433), and all participants provided written informed consent.

Data collection
Comprehensive clinical features (motor, NMS and cognitive domains) and blood biomarkers were collected and used in the study. All clinical assessments were performed while patients were on their PD medications.
NMS: The MDS-UPDRS Part I score (Non-Motor Aspects of Experiences of Daily Living) and the Non-Motor Symptom Scale (NMSS) total score were used to assess NMS burden; NMSS 33 […] 35 , in which cognitive impairment had to be present on at least two neuropsychological tests, with performance 1.5 standard deviations (SDs) below norms as the cut-off, either within a single cognitive domain or across different cognitive domains.
Others: Blood pressure was measured in the supine position and after 3 min of standing. An orthostatic drop in systolic blood pressure (SBP) greater than 10 mmHg was considered a significant BP drop and regarded as an objective measure of autonomic dysfunction 4 . We also collected demographic data, including sex, age of diagnosis and ethnicity.
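The BP drop criterion above reduces to a single comparison per patient. The sketch below is illustrative only: the function name and the readings are hypothetical, and it applies the strict "greater than 10 mmHg" cut-off to one supine and one 3-minute standing SBP reading per patient.

```python
def significant_bp_drop(supine_sbp, standing_sbp, threshold_mmhg=10.0):
    """Return True if the orthostatic systolic drop exceeds threshold_mmhg.

    supine_sbp:   systolic BP in mmHg measured supine.
    standing_sbp: systolic BP in mmHg measured after 3 min of standing.
    The comparison is strict, matching 'greater than 10 mmHg'.
    """
    return (supine_sbp - standing_sbp) > threshold_mmhg


# Hypothetical (supine, standing) readings in mmHg for three patients.
readings = [(135, 120), (130, 120), (118, 121)]
flags = [significant_bp_drop(s, t) for s, t in readings]
proportion = sum(flags) / len(flags)  # share of patients with a significant drop
```

Note that a drop of exactly 10 mmHg is not flagged under this reading of the criterion, and a rise on standing is never flagged.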
Blood biomarker assessments. We genotyped variants of SNCA, LRRK2, Park16, ITPKB and SV2C using the Illumina Infinium Global Screening Array-24 v2.0. The PRS was defined as the sum of the number of risk alleles per individual, each weighted by its effect size estimate, i.e., the logarithm of the odds ratio 36 . In the current study, we calculated the PRS from 5 Asian GWAS SNPs (SNCA, LRRK2, Park16, ITPKB, SV2C) with the highest effect sizes and p values below the genome-wide significance level (5 × 10^-8) in the latest Asian GWAS meta-analysis 37 , to provide an individual quantitative measure of genetic burden. The SNP data used for the PRS calculation can be found in the supplementary data.
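The weighted PRS described above is PRS = Σ_i d_i ln(OR_i), where d_i is the patient's effect-allele dosage (0, 1 or 2) at SNP i. A minimal sketch follows, assuming dosages are already coded per locus. The odds ratios are hypothetical placeholders, not the GWAS estimates listed in the supplementary data, and the function and variable names are illustrative.

```python
import math


def polygenic_risk_score(dosages, odds_ratios):
    """Weighted PRS: sum over SNPs of effect-allele dosage times ln(odds ratio).

    dosages:     {locus: number of effect alleles carried (0, 1 or 2)}
    odds_ratios: {locus: odds ratio of the effect allele}
    """
    if set(dosages) != set(odds_ratios):
        raise ValueError("dosages and odds_ratios must cover the same loci")
    return sum(dosages[snp] * math.log(odds_ratios[snp]) for snp in dosages)


# Hypothetical odds ratios for illustration only (not the published values).
ORS = {"SNCA": 1.3, "LRRK2": 1.2, "Park16": 0.8, "ITPKB": 0.9, "SV2C": 0.85}
# Hypothetical patient: dosage of the effect allele at each locus.
patient = {"SNCA": 2, "LRRK2": 0, "Park16": 1, "ITPKB": 1, "SV2C": 0}
score = polygenic_risk_score(patient, ORS)
```

Because each weight is ln(OR), an effect allele with OR < 1 lowers the score, so protective and risk alleles contribute in opposite directions to the same scale.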
We tested 10 commercially available blood biomarkers: homocysteine (Hcy), C-reactive protein (CRP), vitamin D, uric acid (UA) and lipid markers, including triglyceride (TG), total cholesterol (TC), high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), apolipoprotein A1 (Apo A1) and apolipoprotein B (Apo B). Blood biomarkers were measured in overnight-fasting venous serum samples by enzymatic assay in a professional medical laboratory (Quest Laboratories Pte Ltd, Singapore).

Statistical methods
Cluster analysis. Cluster analysis was performed in Python Software version 3 (http://www.python.org). Seventeen variables (Age of diagnosis, PRS, presence of significant BP drop, MDS-UPDRS Part II score, MDS-UPDRS Part III score, tremor score, PIGD score, MDS-UPDRS Part I score, ESS Total Score, HADS Anxiety Total score, HADS Depression Total score, RBD1Q, Memory score, Visuospatial score, Attention score, Language score, Executive score) were selected based on expert opinion and contemporary evidence 18 . All variables were standardized as Z-scores for the cluster analysis. Agglomerative hierarchical clustering with Euclidean distance was applied. We selected the three-cluster solution because it gave a more balanced data distribution and better clinical interpretability. The missing-value pattern was identified as missing at random; hence, a single imputation approach was used to impute the 92 (2.6%) missing values in the baseline variables.
[…] College Station, TX: Stata Corp LLC) and SAS OnDemand for Academics (SAS Institute Inc. 2014. SAS® OnDemand for Academics: User's Guide. Cary, NC: SAS Institute Inc.). Continuous variables were summarized as mean with standard deviation (SD) or median with first and third quartiles. Categorical variables were summarized as frequencies and percentages. Demographics, clinical characteristics not included in the cluster analysis and allelic distributions of related PD genes were compared among clusters. Fisher's exact test or Pearson's chi-square test (as appropriate) was used to compare categorical variables among clusters, while one-way ANOVA or the Kruskal-Wallis test (depending on whether the normality assumption was tenable) was used to compare continuous variables among clusters.
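The cluster analysis described above (Z-score standardization, Euclidean distance, agglomerative hierarchical clustering cut at three clusters) can be sketched in plain Python as below. The text does not state the linkage criterion, so average linkage here is an assumption (Ward's linkage is a common alternative), and the function names and demo data are hypothetical. The naive merging loop is cubic in the number of patients, which is acceptable for a cohort of this size but is meant only to show the mechanics.

```python
import math
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each variable (column) to mean 0 and SD 1 (population SD)."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [[(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, stats)]
            for row in rows]


def agglomerative_average_linkage(points, n_clusters):
    """Naive agglomerative clustering, average linkage on Euclidean distance.

    Starts with one cluster per patient and repeatedly merges the two clusters
    with the smallest mean pairwise distance until n_clusters remain.
    Returns one integer label per input row, in input order.
    """
    dist = [[math.dist(p, q) for q in points] for p in points]
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = mean(dist[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # j > i, so index i stays valid
        del clusters[j]
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels


# Hypothetical two-variable data with three well-separated groups.
rows = [[0, 0], [0.1, 0], [0, 0.1],
        [5, 5], [5.1, 5], [5, 5.1],
        [10, 0], [10.1, 0], [10, 0.1]]
labels = agglomerative_average_linkage(zscore_columns(rows), n_clusters=3)
```

For real analyses, `scipy.cluster.hierarchy.linkage(X, method=..., metric="euclidean")` followed by `fcluster(Z, t=3, criterion="maxclust")` performs the same steps efficiently and also yields the dendrogram.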
Blood biochemical marker comparisons among clusters. Comparisons of blood biochemical markers were performed in SAS OnDemand for Academics (SAS Institute Inc. 2014. SAS® OnDemand for Academics: User's Guide. Cary, NC: SAS Institute Inc.). All blood biochemical markers except CRP were log-transformed to reduce right skewness. A generalized linear model assuming a normal distribution for the outcome variable was used to compare the biomarkers among clusters, adjusting for age of diagnosis and sex. Because CRP remained skewed even after log transformation, it was instead modelled with a gamma distribution. The false discovery rate (FDR) method 38 was used to control for multiple testing, and q values were calculated. We set the q value threshold at 0.1.
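The text cites an FDR method (ref. 38) without naming the procedure; the sketch below assumes the Benjamini-Hochberg step-up adjustment, which is the usual way q values are computed, and applies the q < 0.1 threshold stated above. The p values are hypothetical, not the study's results, and the function name is illustrative. In statsmodels, `multipletests(pvals, method="fdr_bh")` from `statsmodels.stats.multitest` performs the same adjustment.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg q values (FDR-adjusted p values) in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    # Step down from the largest p value so q values stay monotone and capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical p values for four biomarkers (not the study's results).
pvals = [0.01, 0.04, 0.03, 0.005]
qvals = benjamini_hochberg(pvals)
significant = [qv < 0.1 for qv in qvals]  # q threshold used in the text
```

Each q value is the smallest FDR level at which that test would be declared significant, so comparing q values against 0.1 is equivalent to running the Benjamini-Hochberg procedure at an FDR of 10%.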

DATA AVAILABILITY
The data collected during this study are available from the corresponding author upon reasonable request from qualified individuals.